Overview

Dataset statistics

Number of variables: 10
Number of observations: 710
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 128.8 KiB
Average record size in memory: 185.8 B

Variable types

DateTime: 1
Numeric: 7
Categorical: 2

Alerts

HML is highly correlated with Mkt_RF and 3 other fields (High correlation)
CMA is highly correlated with Mkt_RF and 2 other fields (High correlation)
Mkt_RF is highly correlated with SMB and 4 other fields (High correlation)
SMB is highly correlated with Mkt_RF and 3 other fields (High correlation)
RMW is highly correlated with SMB and 3 other fields (High correlation)
MOM is highly correlated with SMB and 1 other field (High correlation)
Best is highly correlated with Mkt_RF and 1 other field (High correlation)
Worst is highly correlated with Mkt_RF and 1 other field (High correlation)
Date has unique values (Unique)
RF has 69 (9.7%) zeros (Zeros)

Reproduction

Analysis started: 2022-10-11 15:44:41.134132
Analysis finished: 2022-10-11 15:44:43.972958
Duration: 2.84 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

Date
Date

UNIQUE

Distinct: 710
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB
Minimum: 1963-07-01 00:00:00
Maximum: 2022-08-01 00:00:00
Histogram with fixed size bins (bins=50)

Mkt_RF
Real number (ℝ)

HIGH CORRELATION

Distinct: 566
Distinct (%): 79.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.005572535211
Minimum: -0.2324
Maximum: 0.161
Zeros: 1
Zeros (%): 0.1%
Negative: 285
Negative (%): 40.1%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.2324
5th percentile: -0.072375
Q1: -0.019675
Median: 0.00915
Q3: 0.034
95th percentile: 0.070895
Maximum: 0.161
Range: 0.3934
Interquartile range (IQR): 0.053675

Descriptive statistics

Standard deviation: 0.04477172841
Coefficient of variation (CV): 8.034355408
Kurtosis: 1.83251363
Mean: 0.005572535211
Median Absolute Deviation (MAD): 0.02695
Skewness: -0.5032989172
Sum: 3.9565
Variance: 0.002004507665
Monotonicity: Not monotonic
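
Several of the Mkt_RF figures above follow from the others by simple arithmetic. The short sketch below recomputes them from the reported values as a consistency check; the variable names are illustrative and not part of the report.

```python
# Values copied from the Mkt_RF statistics in this report.
n_obs = 710
minimum, maximum = -0.2324, 0.161
q1, q3 = -0.019675, 0.034
total, std = 3.9565, 0.04477172841

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
mean = total / n_obs              # Mean = Sum / number of observations
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = squared standard deviation

print(f"Range={value_range:.4f}  IQR={iqr:.6f}  Mean={mean:.12f}")
print(f"CV={cv:.6f}  Variance={variance:.12f}")
```

The recomputed values agree with the reported Range, IQR, Mean, CV and Variance up to rounding.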
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
-0.0138                 3  0.4%
-0.0144                 3  0.4%
0.0103                  3  0.4%
0.014                   3  0.4%
0.0078                  3  0.4%
-0.0229                 3  0.4%
0.0693                  3  0.4%
0.0143                  3  0.4%
0.0311                  3  0.4%
0.0206                  3  0.4%
Other values (556)    680  95.8%

Minimum 10 values

Value               Count  Frequency (%)
-0.2324                 1  0.1%
-0.1723                 1  0.1%
-0.1608                 1  0.1%
-0.1339                 1  0.1%
-0.129                  1  0.1%
-0.1275                 1  0.1%
-0.1191                 1  0.1%
-0.1177                 1  0.1%
-0.11                   1  0.1%
-0.1072                 1  0.1%

Maximum 10 values

Value               Count  Frequency (%)
0.161                   1  0.1%
0.1366                  1  0.1%
0.1365                  1  0.1%
0.1247                  2  0.3%
0.1216                  1  0.1%
0.1135                  1  0.1%
0.113                   1  0.1%
0.1114                  1  0.1%
0.1084                  1  0.1%
0.1028                  1  0.1%

SMB
Real number (ℝ)

HIGH CORRELATION

Distinct: 510
Distinct (%): 71.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.00227056338
Minimum: -0.1535
Maximum: 0.1834
Zeros: 2
Zeros (%): 0.3%
Negative: 340
Negative (%): 47.9%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.1535
5th percentile: -0.042955
Q1: -0.015175
Median: 0.001
Q3: 0.02035
95th percentile: 0.04914
Maximum: 0.1834
Range: 0.3369
Interquartile range (IQR): 0.035525

Descriptive statistics

Standard deviation: 0.03024648984
Coefficient of variation (CV): 13.32113876
Kurtosis: 3.135512646
Mean: 0.00227056338
Median Absolute Deviation (MAD): 0.018
Skewness: 0.3422554032
Sum: 1.6121
Variance: 0.0009148501478
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.0013                  5  0.7%
0.0271                  4  0.6%
-0.0139                 4  0.6%
-0.0114                 4  0.6%
0.0031                  4  0.6%
-0.0062                 4  0.6%
-0.0041                 3  0.4%
0.0192                  3  0.4%
-0.0107                 3  0.4%
-0.0005                 3  0.4%
Other values (500)    673  94.8%

Minimum 10 values

Value               Count  Frequency (%)
-0.1535                 1  0.1%
-0.1002                 1  0.1%
-0.0831                 1  0.1%
-0.0807                 1  0.1%
-0.0728                 1  0.1%
-0.0693                 1  0.1%
-0.0691                 1  0.1%
-0.0682                 1  0.1%
-0.0645                 1  0.1%
-0.0643                 1  0.1%

Maximum 10 values

Value               Count  Frequency (%)
0.1834                  1  0.1%
0.1291                  1  0.1%
0.1041                  1  0.1%
0.0993                  1  0.1%
0.0918                  1  0.1%
0.091                   1  0.1%
0.0851                  1  0.1%
0.0799                  1  0.1%
0.0761                  1  0.1%
0.0754                  1  0.1%

HML
Real number (ℝ)

HIGH CORRELATION

Distinct: 498
Distinct (%): 70.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.00298084507
Minimum: -0.1397
Maximum: 0.1275
Zeros: 0
Zeros (%): 0.0%
Negative: 326
Negative (%): 45.9%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.1397
5th percentile: -0.041
Q1: -0.013875
Median: 0.00245
Q3: 0.0175
95th percentile: 0.05401
Maximum: 0.1275
Range: 0.2672
Interquartile range (IQR): 0.031375

Descriptive statistics

Standard deviation: 0.02966002661
Coefficient of variation (CV): 9.950207377
Kurtosis: 2.379070988
Mean: 0.00298084507
Median Absolute Deviation (MAD): 0.0158
Skewness: 0.1068899396
Sum: 2.1164
Variance: 0.0008797171784
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.0085                  7  1.0%
0.0117                  4  0.6%
0.0015                  4  0.6%
0.0175                  4  0.6%
-0.0002                 4  0.6%
-0.0013                 4  0.6%
0.0043                  4  0.6%
0.0119                  4  0.6%
0.0227                  4  0.6%
-0.0276                 3  0.4%
Other values (488)    668  94.1%

Minimum 10 values

Value               Count  Frequency (%)
-0.1397                 1  0.1%
-0.1129                 1  0.1%
-0.0987                 1  0.1%
-0.097                  1  0.1%
-0.0843                 1  0.1%
-0.0833                 1  0.1%
-0.0832                 1  0.1%
-0.0782                 1  0.1%
-0.0766                 1  0.1%
-0.0695                 1  0.1%

Maximum 10 values

Value               Count  Frequency (%)
0.1275                  1  0.1%
0.1248                  1  0.1%
0.1232                  1  0.1%
0.0863                  1  0.1%
0.0841                  1  0.1%
0.083                   1  0.1%
0.0828                  1  0.1%
0.0819                  1  0.1%
0.0817                  1  0.1%
0.0763                  1  0.1%

RMW
Real number (ℝ)

HIGH CORRELATION

Distinct: 446
Distinct (%): 62.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.002724225352
Minimum: -0.1873
Maximum: 0.1309
Zeros: 1
Zeros (%): 0.1%
Negative: 309
Negative (%): 43.5%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.1873
5th percentile: -0.027485
Q1: -0.007875
Median: 0.0024
Q3: 0.013075
95th percentile: 0.03471
Maximum: 0.1309
Range: 0.3182
Interquartile range (IQR): 0.02095

Descriptive statistics

Standard deviation: 0.02215376522
Coefficient of variation (CV): 8.132133858
Kurtosis: 11.54272923
Mean: 0.002724225352
Median Absolute Deviation (MAD): 0.0106
Skewness: -0.2997310721
Sum: 1.9342
Variance: 0.0004907893136
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.003                   7  1.0%
0.013                   5  0.7%
0.0027                  5  0.7%
0.0131                  5  0.7%
-0.0068                 5  0.7%
0.0204                  4  0.6%
0.0008                  4  0.6%
-0.0137                 4  0.6%
0.0093                  4  0.6%
-0.0042                 4  0.6%
Other values (436)    663  93.4%

Minimum 10 values

Value               Count  Frequency (%)
-0.1873                 1  0.1%
-0.0921                 1  0.1%
-0.0832                 1  0.1%
-0.076                  1  0.1%
-0.0706                 1  0.1%
-0.0631                 1  0.1%
-0.048                  1  0.1%
-0.047                  2  0.3%
-0.0462                 1  0.1%
-0.0444                 1  0.1%

Maximum 10 values

Value               Count  Frequency (%)
0.1309                  1  0.1%
0.1182                  1  0.1%
0.096                   1  0.1%
0.0911                  1  0.1%
0.0806                  1  0.1%
0.0766                  1  0.1%
0.0742                  1  0.1%
0.0722                  1  0.1%
0.0646                  1  0.1%
0.0629                  1  0.1%

CMA
Real number (ℝ)

HIGH CORRELATION

Distinct: 443
Distinct (%): 62.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.002843380282
Minimum: -0.0694
Maximum: 0.0905
Zeros: 1
Zeros (%): 0.1%
Negative: 329
Negative (%): 46.3%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.0694
5th percentile: -0.026555
Q1: -0.01
Median: 0.00095
Q3: 0.0149
95th percentile: 0.03681
Maximum: 0.0905
Range: 0.1599
Interquartile range (IQR): 0.0249

Descriptive statistics

Standard deviation: 0.02039952998
Coefficient of variation (CV): 7.174393842
Kurtosis: 1.426598638
Mean: 0.002843380282
Median Absolute Deviation (MAD): 0.01255
Skewness: 0.3021728958
Sum: 2.0188
Variance: 0.0004161408235
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
-0.0034                 6  0.8%
0.0009                  5  0.7%
0.009                   5  0.7%
-0.0004                 5  0.7%
0.0084                  5  0.7%
-0.012                  4  0.6%
0.0046                  4  0.6%
-0.016                  4  0.6%
-0.0033                 4  0.6%
-0.0095                 4  0.6%
Other values (433)    664  93.5%

Minimum 10 values

Value               Count  Frequency (%)
-0.0694                 1  0.1%
-0.0677                 1  0.1%
-0.0662                 1  0.1%
-0.0583                 1  0.1%
-0.0566                 1  0.1%
-0.0563                 1  0.1%
-0.05                   1  0.1%
-0.0474                 1  0.1%
-0.047                  1  0.1%
-0.0454                 1  0.1%

Maximum 10 values

Value               Count  Frequency (%)
0.0905                  1  0.1%
0.0839                  1  0.1%
0.0771                  1  0.1%
0.0656                  1  0.1%
0.0646                  1  0.1%
0.0621                  1  0.1%
0.0592                  1  0.1%
0.0591                  1  0.1%
0.0589                  1  0.1%
0.0565                  1  0.1%

MOM
Real number (ℝ)

HIGH CORRELATION

Distinct: 539
Distinct (%): 75.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.006294225352
Minimum: -0.343
Maximum: 0.182
Zeros: 1
Zeros (%): 0.1%
Negative: 264
Negative (%): 37.2%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.343
5th percentile: -0.06532
Q1: -0.009525
Median: 0.00735
Q3: 0.028975
95th percentile: 0.064205
Maximum: 0.182
Range: 0.525
Interquartile range (IQR): 0.0385

Descriptive statistics

Standard deviation: 0.0419055474
Coefficient of variation (CV): 6.657776781
Kurtosis: 9.952008228
Mean: 0.006294225352
Median Absolute Deviation (MAD): 0.0192
Skewness: -1.283579652
Sum: 4.4689
Variance: 0.001756074903
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.0316                  5  0.7%
0.0022                  5  0.7%
0.0004                  4  0.6%
0.0086                  4  0.6%
-0.0058                 4  0.6%
0.009                   3  0.4%
0.0252                  3  0.4%
0.0445                  3  0.4%
0.0303                  3  0.4%
-0.0184                 3  0.4%
Other values (529)    673  94.8%

Minimum 10 values

Value               Count  Frequency (%)
-0.343                  1  0.1%
-0.253                  1  0.1%
-0.1633                 1  0.1%
-0.1382                 1  0.1%
-0.1249                 1  0.1%
-0.1243                 1  0.1%
-0.1187                 1  0.1%
-0.1157                 1  0.1%
-0.107                  1  0.1%
-0.0955                 1  0.1%

Maximum 10 values

Value               Count  Frequency (%)
0.182                   1  0.1%
0.166                   1  0.1%
0.1522                  1  0.1%
0.1322                  1  0.1%
0.1275                  1  0.1%
0.1257                  1  0.1%
0.1148                  1  0.1%
0.1038                  1  0.1%
0.0998                  1  0.1%
0.0964                  1  0.1%

RF
Real number (ℝ≥0)

ZEROS

Distinct: 106
Distinct (%): 14.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.003626338028
Minimum: 0
Maximum: 0.0135
Zeros: 69
Zeros (%): 9.7%
Negative: 0
Negative (%): 0.0%
Memory size: 5.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.0014
Median: 0.0038
Q3: 0.0051
95th percentile: 0.0081
Maximum: 0.0135
Range: 0.0135
Interquartile range (IQR): 0.0037

Descriptive statistics

Standard deviation: 0.002682255718
Coefficient of variation (CV): 0.7396595953
Kurtosis: 0.6327734726
Mean: 0.003626338028
Median Absolute Deviation (MAD): 0.00175
Skewness: 0.6596971813
Sum: 2.5747
Variance: 7.194495739 × 10⁻⁶
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0                      69  9.7%
0.0001                 44  6.2%
0.0043                 21  3.0%
0.004                  21  3.0%
0.0042                 18  2.5%
0.0046                 18  2.5%
0.0039                 16  2.3%
0.0031                 16  2.3%
0.0044                 16  2.3%
0.0037                 15  2.1%
Other values (96)     456  64.2%

Minimum 10 values

Value               Count  Frequency (%)
0                      69  9.7%
0.0001                 44  6.2%
0.0002                  8  1.1%
0.0003                  4  0.6%
0.0004                  2  0.3%
0.0005                  1  0.1%
0.0006                  5  0.7%
0.0007                  6  0.8%
0.0008                  7  1.0%
0.0009                  7  1.0%

Maximum 10 values

Value               Count  Frequency (%)
0.0135                  1  0.1%
0.0131                  1  0.1%
0.0128                  1  0.1%
0.0126                  1  0.1%
0.0124                  2  0.3%
0.0121                  3  0.4%
0.0115                  1  0.1%
0.0113                  1  0.1%
0.0108                  1  0.1%
0.0107                  2  0.3%

Best
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 42.4 KiB

Mkt_RF: 215
MOM: 165
HML: 104
SMB: 96
RMW: 70
Other values (2): 60

Length

Max length: 6
Median length: 3
Mean length: 3.902816901
Min length: 2

Characters and Unicode

Total characters: 2771
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MOM
2nd row: Mkt_RF
3rd row: CMA
4th row: MOM
5th row: CMA

Common Values

Value               Count  Frequency (%)
Mkt_RF                215  30.3%
MOM                   165  23.2%
HML                   104  14.6%
SMB                    96  13.5%
RMW                    70  9.9%
CMA                    56  7.9%
RF                      4  0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count  Frequency (%)
mkt_rf                215  30.3%
mom                   165  23.2%
hml                   104  14.6%
smb                    96  13.5%
rmw                    70  9.9%
cma                    56  7.9%
rf                      4  0.6%

Most occurring characters

Value               Count  Frequency (%)
M                     871  31.4%
R                     289  10.4%
F                     219  7.9%
k                     215  7.8%
t                     215  7.8%
_                     215  7.8%
O                     165  6.0%
H                     104  3.8%
L                     104  3.8%
S                      96  3.5%
Other values (4)      278  10.0%

Most occurring categories

Value                    Count  Frequency (%)
Uppercase Letter          2126  76.7%
Lowercase Letter           430  15.5%
Connector Punctuation      215  7.8%

Most frequent character per category

Uppercase Letter
Value               Count  Frequency (%)
M                     871  41.0%
R                     289  13.6%
F                     219  10.3%
O                     165  7.8%
H                     104  4.9%
L                     104  4.9%
S                      96  4.5%
B                      96  4.5%
W                      70  3.3%
C                      56  2.6%

Lowercase Letter
Value               Count  Frequency (%)
k                     215  50.0%
t                     215  50.0%

Connector Punctuation
Value               Count  Frequency (%)
_                     215  100.0%

Most occurring scripts

Value               Count  Frequency (%)
Latin                2556  92.2%
Common                215  7.8%

Most frequent character per script

Latin
Value               Count  Frequency (%)
M                     871  34.1%
R                     289  11.3%
F                     219  8.6%
k                     215  8.4%
t                     215  8.4%
O                     165  6.5%
H                     104  4.1%
L                     104  4.1%
S                      96  3.8%
B                      96  3.8%
Other values (3)      182  7.1%

Common
Value               Count  Frequency (%)
_                     215  100.0%

Most occurring blocks

Value               Count  Frequency (%)
ASCII                2771  100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
M                     871  31.4%
R                     289  10.4%
F                     219  7.9%
k                     215  7.8%
t                     215  7.8%
_                     215  7.8%
O                     165  6.0%
H                     104  3.8%
L                     104  3.8%
S                      96  3.5%
Other values (4)      278  10.0%

Worst
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 42.2 KiB

Mkt_RF: 178
SMB: 122
HML: 122
MOM: 116
RMW: 94
Other values (2): 78

Length

Max length: 6
Median length: 3
Mean length: 3.743661972
Min length: 2

Characters and Unicode

Total characters: 2658
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CMA
2nd row: SMB
3rd row: Mkt_RF
4th row: CMA
5th row: SMB

Common Values

Value               Count  Frequency (%)
Mkt_RF                178  25.1%
SMB                   122  17.2%
HML                   122  17.2%
MOM                   116  16.3%
RMW                    94  13.2%
CMA                    72  10.1%
RF                      6  0.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count  Frequency (%)
mkt_rf                178  25.1%
smb                   122  17.2%
hml                   122  17.2%
mom                   116  16.3%
rmw                    94  13.2%
cma                    72  10.1%
rf                      6  0.8%

Most occurring characters

Value               Count  Frequency (%)
M                     820  30.9%
R                     278  10.5%
F                     184  6.9%
k                     178  6.7%
t                     178  6.7%
_                     178  6.7%
S                     122  4.6%
B                     122  4.6%
H                     122  4.6%
L                     122  4.6%
Other values (4)      354  13.3%

Most occurring categories

Value                    Count  Frequency (%)
Uppercase Letter          2124  79.9%
Lowercase Letter           356  13.4%
Connector Punctuation      178  6.7%

Most frequent character per category

Uppercase Letter
Value               Count  Frequency (%)
M                     820  38.6%
R                     278  13.1%
F                     184  8.7%
S                     122  5.7%
B                     122  5.7%
H                     122  5.7%
L                     122  5.7%
O                     116  5.5%
W                      94  4.4%
C                      72  3.4%

Lowercase Letter
Value               Count  Frequency (%)
k                     178  50.0%
t                     178  50.0%

Connector Punctuation
Value               Count  Frequency (%)
_                     178  100.0%

Most occurring scripts

Value               Count  Frequency (%)
Latin                2480  93.3%
Common                178  6.7%

Most frequent character per script

Latin
Value               Count  Frequency (%)
M                     820  33.1%
R                     278  11.2%
F                     184  7.4%
k                     178  7.2%
t                     178  7.2%
S                     122  4.9%
B                     122  4.9%
H                     122  4.9%
L                     122  4.9%
O                     116  4.7%
Other values (3)      238  9.6%

Common
Value               Count  Frequency (%)
_                     178  100.0%

Most occurring blocks

Value               Count  Frequency (%)
ASCII                2658  100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
M                     820  30.9%
R                     278  10.5%
F                     184  6.9%
k                     178  6.7%
t                     178  6.7%
_                     178  6.7%
S                     122  4.6%
B                     122  4.6%
H                     122  4.6%
L                     122  4.6%
Other values (4)      354  13.3%

Interactions

[Figures omitted: 49 pairwise interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
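
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs: the helper names are hypothetical, ties are assumed to receive the average of their positions, and the data are the Mkt_RF and MOM values from the ten "First rows" of this report.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def covariance(x, y):
    """Population covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    std_rx = covariance(rx, rx) ** 0.5
    std_ry = covariance(ry, ry) ** 0.5
    return covariance(rx, ry) / (std_rx * std_ry)


mkt_rf = [-0.0039, 0.0507, -0.0157, 0.0253, -0.0085, 0.0183, 0.0224, 0.0154, 0.0141, 0.0010]
mom = [0.0090, 0.0101, 0.0019, 0.0312, -0.0074, 0.0175, 0.0086, 0.0026, 0.0075, -0.0058]
rho = spearman_rho(mkt_rf, mom)
print(f"Spearman's rho on the ten sample rows: {rho:.3f}")
```

Because only ranks enter the formula, any strictly increasing transformation of either variable leaves ρ unchanged.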

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
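
A minimal sketch of this formula in plain Python follows, together with a check of the location and scale invariance mentioned above. The function name is hypothetical, and the data are the Mkt_RF and MOM values from the ten "First rows" of this report.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    # The 1/n factors of the covariance and of the two standard deviations cancel.
    return cov / (ss_x * ss_y) ** 0.5


mkt_rf = [-0.0039, 0.0507, -0.0157, 0.0253, -0.0085, 0.0183, 0.0224, 0.0154, 0.0141, 0.0010]
mom = [0.0090, 0.0101, 0.0019, 0.0312, -0.0074, 0.0175, 0.0086, 0.0026, 0.0075, -0.0058]

r_original = pearson_r(mkt_rf, mom)
# Shift and rescale one variable (positive scale factor): r should not change.
rescaled = [100 * value + 5 for value in mkt_rf]
r_rescaled = pearson_r(rescaled, mom)
print(f"r = {r_original:.3f}, after rescaling Mkt_RF: {r_rescaled:.3f}")
```

The invariance holds for positive scale factors; multiplying one variable by a negative number flips the sign of r.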

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
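
The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that definition in plain Python; the function name is hypothetical, and library implementations often use the tie-adjusted τ-b variant instead, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as described in the text."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts as neither.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


mkt_rf = [-0.0039, 0.0507, -0.0157, 0.0253, -0.0085, 0.0183, 0.0224, 0.0154, 0.0141, 0.0010]
mom = [0.0090, 0.0101, 0.0019, 0.0312, -0.0074, 0.0175, 0.0086, 0.0026, 0.0075, -0.0058]
tau = kendall_tau_a(mkt_rf, mom)
print(f"Kendall's tau-a on the ten sample rows: {tau:.3f}")
```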

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
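
A minimal sketch of a bias-corrected Cramér's V is given below. It assumes the commonly cited form of Bergsma's correction (shrinking φ² and the effective numbers of rows and columns by terms of order 1/(n-1)); the function name and the toy data are hypothetical, and this is not necessarily the exact code used to produce this report.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_label, row_count in row_totals.items():
        for col_label, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Assumed correction terms (Bergsma, 2013).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denominator) if denominator > 0 else 0.0


# Toy data: perfectly associated labels versus independent labels.
perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
print(f"perfect association: {perfect:.3f}, independence: {independent:.3f}")
```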

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

     Date         Mkt_RF      SMB      HML      RMW      CMA      MOM      RF  Best    Worst
  0  1963-07-01  -0.0039  -0.0041  -0.0097   0.0068  -0.0118   0.0090  0.0027  MOM     CMA
  1  1963-08-01   0.0507  -0.0080   0.0180   0.0036  -0.0035   0.0101  0.0025  Mkt_RF  SMB
  2  1963-09-01  -0.0157  -0.0052   0.0013  -0.0071   0.0029   0.0019  0.0027  CMA     Mkt_RF
  3  1963-10-01   0.0253  -0.0139  -0.0010   0.0280  -0.0201   0.0312  0.0029  MOM     CMA
  4  1963-11-01  -0.0085  -0.0088   0.0175  -0.0051   0.0224  -0.0074  0.0027  CMA     SMB
  5  1963-12-01   0.0183  -0.0210  -0.0002   0.0003  -0.0007   0.0175  0.0029  Mkt_RF  SMB
  6  1964-01-01   0.0224   0.0013   0.0148   0.0017   0.0147   0.0086  0.0030  Mkt_RF  SMB
  7  1964-02-01   0.0154   0.0028   0.0281  -0.0005   0.0091   0.0026  0.0026  HML     RMW
  8  1964-03-01   0.0141   0.0123   0.0340  -0.0221   0.0322   0.0075  0.0031  HML     RMW
  9  1964-04-01   0.0010  -0.0152  -0.0067  -0.0127  -0.0108  -0.0058  0.0029  RF      SMB

Last rows

     Date         Mkt_RF      SMB      HML      RMW      CMA      MOM      RF  Best    Worst
700  2021-11-01  -0.0155  -0.0176  -0.0044   0.0722   0.0174   0.0090  0.0000  RMW     SMB
701  2021-12-01   0.0310  -0.0077   0.0328   0.0492   0.0443  -0.0260  0.0001  RMW     MOM
702  2022-01-01  -0.0625  -0.0405   0.1275   0.0087   0.0771  -0.0259  0.0000  HML     Mkt_RF
703  2022-02-01  -0.0229   0.0296   0.0304  -0.0208   0.0313   0.0176  0.0000  CMA     Mkt_RF
704  2022-03-01   0.0305  -0.0215  -0.0180  -0.0156   0.0317   0.0300  0.0001  CMA     SMB
705  2022-04-01  -0.0946  -0.0040   0.0619   0.0363   0.0592   0.0489  0.0001  HML     Mkt_RF
706  2022-05-01  -0.0034  -0.0006   0.0841   0.0144   0.0398   0.0248  0.0003  HML     Mkt_RF
707  2022-06-01  -0.0843   0.0130  -0.0597   0.0185  -0.0470   0.0079  0.0006  RMW     Mkt_RF
708  2022-07-01   0.0957   0.0187  -0.0410   0.0068  -0.0694  -0.0396  0.0008  Mkt_RF  CMA
709  2022-08-01  -0.0378   0.0151   0.0031  -0.0480   0.0131   0.0209  0.0019  MOM     RMW
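
In the sample rows, Best and Worst appear to name the column with the largest and smallest value among the seven numeric columns of each row. The report does not state this rule, so the sketch below is an inference from the sample, with a hypothetical helper name, checked against the first and last rows shown.

```python
FACTORS = ["Mkt_RF", "SMB", "HML", "RMW", "CMA", "MOM", "RF"]


def best_and_worst(row):
    """Return the names of the columns with the largest and smallest value in a row."""
    best = max(FACTORS, key=lambda name: row[name])
    worst = min(FACTORS, key=lambda name: row[name])
    return best, worst


# Row 0 (1963-07-01) and row 709 (2022-08-01) from the sample tables.
first_row = dict(zip(FACTORS, [-0.0039, -0.0041, -0.0097, 0.0068, -0.0118, 0.0090, 0.0027]))
last_row = dict(zip(FACTORS, [-0.0378, 0.0151, 0.0031, -0.0480, 0.0131, 0.0209, 0.0019]))

print(best_and_worst(first_row))
print(best_and_worst(last_row))
```

If the assumption holds, the first call yields the pair recorded for row 0 (MOM, CMA) and the second the pair recorded for row 709 (MOM, RMW).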